4 - Compositional Analysis

Author

CDN team

Published

March 12, 2024

Introduction

In this notebook we will study which cell types compose different samples in single-cell data. We will do this using a subset of statistics called compositional data analysis (or CoDA for short), because single-cell experiments are samples of a larger tissue, and the cells we obtain represents the proportions of cell types in the tissue, but does not reflect their true abundance.

CoDA relativizes proportions by comparing them to a stable population, comparing them to each other, or to a mean across samples, all resulting in a table of relative abundances of cell types per sample. This sample x species matrix is a natural foundation for conducting case-control analysis, as we have a single vector of variables per sample. Here, we can scrutinize variances between conditions and between samples.

Useful Resources

  • John Aitchison’s Compositional Data Analysis (J. Aitchison 1982)
  • scCODA, a python + scanpy package for ALR compositional analysis (Büttner et al. 2021)
  • Cacoa, an R package for case-control analysis that uses ILR
  • compositions, an R package for CoDA
  • A paper describing the compositionality problem in terms of microbiome studies (Morton et al. 2019)
  • sccomp - an R method for single-cell compositional comparisons https://github.com/stemangiola/sccomp

Key Takeaways

  • Compositions provide a good foundation for comparing individual samples
  • The compositional nature of scRNAseq means only conclusions about relative values can be made, and overall cell density can bias results
  • Compositional transforms allow quantitative assessment of scRNA abundances

Libraries

Installation

if (!requireNamespace("tidyverse", quietly = TRUE))
    install.packages('tidyverse')
if (!requireNamespace("Seurat", quietly = TRUE))
    install.packages('Seurat')
if (!requireNamespace("compositions", quietly = TRUE))
    install.packages('compositions')
if (!requireNamespace("colorBlindness", quietly = TRUE))
    install.packages('colorBlindness')
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
if (!requireNamespace("EnhancedVolcano", quietly = TRUE))
    BiocManager::install("EnhancedVolcano")
if (!requireNamespace("ComplexHeatmap", quietly = TRUE))
    BiocManager::install("ComplexHeatmap")
if (!requireNamespace("scales", quietly = TRUE))
    install.packages('scales')
if (!requireNamespace("viridis", quietly = TRUE))
    install.packages('viridis')
if (!requireNamespace("DT", quietly = TRUE))
    install.packages('DT')
if (!requireNamespace("reshape2", quietly = TRUE))
    install.packages('reshape2')
if (!requireNamespace("ggrepel", quietly = TRUE))
    install.packages("ggrepel")
if (!requireNamespace("sccomp", quietly = TRUE))
devtools::install_github("stemangiola/sccomp")

Load Libraries

library(colorBlindness)
library(tidyverse)
library(EnhancedVolcano)
library(viridis)
library(scales)
library(DT)
library(Seurat)
library(compositions)
library(reshape2)
library(ComplexHeatmap)
library(ggrepel)
library(sccomp)

Load data

srobj <- readRDS('/Users/kylekimler/Downloads/d8e35450-de43-451a-9979-276eac688bce.rds')
genes <- read_csv('/Users/kylekimler/CDN/workshops/workshop1/data/cov_flu_gene_names_table.csv') # Load a provided gene conversion table to convert ENSG to readable gene symbols

# Need to remake seurat object
mtx <- srobj@assays$RNA@data
rownames(mtx) <- genes[match(row.names(mtx),genes$index),]$feature_name

srobj <- CreateSeuratObject(counts = mtx, meta.data = srobj@meta.data)

srobj
An object of class Seurat 
33234 features across 59572 samples within 1 assay 
Active assay: RNA (33234 features, 0 variable features)
 1 layer present: counts
set.seed(687)

Other setup

Generate a color palette for plotting

# https://www.datanovia.com/en/blog/easy-way-to-expand-color-palettes-in-r/
# nb.cols <- length(unique(se$Celltype))
# mycolors <- colorRampPalette(paletteMartin)(nb.cols)
pal <- paletteMartin
names(pal) <- sort(unique(srobj$Celltype))

Composition of clusters per sample

In papers we often see percentages of clusters per sample or of subclusters in a larger celltype. These are done using stacked bar plots as follows.

celltypePercentagesDF <- srobj@meta.data %>% count(`Sample ID`,Celltype) %>% group_by(`Sample ID`) %>% reframe(Celltype, celltype_n = n, total_n_cells = sum(n)) %>% mutate(pct_celltype = celltype_n / total_n_cells)

datatable(celltypePercentagesDF)
celltypePercentagesDF %>% 
    ggplot(aes(x = `Sample ID`, y = pct_celltype, fill = Celltype), color = 'white') + 
    geom_bar(position='stack', stat='identity') + 
    scale_fill_manual(values = pal) + 
    ggtitle('Composition of each sample by celltype') +
    theme_linedraw(base_size = 20) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))

Comparing compositions with Wilcoxon tests

And often these compositions are compared using rank sum tests, which Morton et al (Morton et al. 2019) show is a great way to avoid the compositionality problem. The most common rank sum test is the Wilcoxon test, the default test Seurat uses for comparisons of genes across clusters with FindAllMarkers.

If we use the Wilcoxon test here in a comparison of compositions across samples Flu vs. Normal, we will see that the result depends on proper normalization of the data.

# Create a function to perform Wilcoxon rank-sum tests and return p-values
perform_wilcox_test <- function(data, group_col, group1, group2, species_col) {
  group1_data <- data[data[[group_col]] == group1, species_col]
  group2_data <- data[data[[group_col]] == group2, species_col]
  
  test_result <- wilcox.test(group1_data, group2_data)
  return(data.frame(
      p.val = test_result$p.value,
      statistic = test_result$statistic,
      species = species_col, 
      test = paste(group1,'vs',group2)))
}

rawTb <- celltypePercentagesDF %>%
     dplyr::select(sample=`Sample ID`, Celltype, celltype_n) %>%
     pivot_wider(names_from=Celltype, values_from= celltype_n, values_fill=0) %>%
     separate(sample, sep = " ", into=c('group','sample_n'), remove=FALSE) %>% 
     column_to_rownames("sample") %>%
     select(-sample_n)

# Create a long dataframe of raw values for comparisons
raw_long <- rawTb %>% 
  pivot_longer(-group, names_to="species", values_to="counts") %>%
  mutate(transform = "Raw Counts")

species_col_names <- colnames(rawTb)[2:ncol(rawTb)]

wilcoxon_raw <- lapply(species_col_names, function(species) 
    perform_wilcox_test(rawTb,
        group_col = "group",
        group1 = 'Flu',
        group2 = 'Normal',
        species_col = species))

wilcoxon_raw <- wilcoxon_raw %>%
    bind_rows %>% 
    data.frame %>% 
    mutate(transform = "Raw")

pctTb <- celltypePercentagesDF %>% 
            dplyr::select(sample=`Sample ID`, Celltype, pct_celltype) %>% 
            pivot_wider(names_from=Celltype, values_from=pct_celltype, values_fill=0) %>% 
            separate(sample, sep = " ", into=c('group','sample_n'), remove=FALSE) %>% 
            column_to_rownames("sample") %>% 
            select(-sample_n) # reorganize dataframe for wilcoxon tests

# Create a long dataframe of percent values for comparisons
pct_long <- pctTb %>% 
  pivot_longer(-group, names_to="species", values_to="counts") %>%
  mutate(transform = "Percentages")

wilcoxon_pct <- lapply(species_col_names, function(species) 
                        perform_wilcox_test(pctTb, 
                                            group_col = "group", 
                                            group1 = 'Flu', 
                                            group2 = 'Normal', 
                                            species_col = species)) 

wilcoxon_pct <- wilcoxon_pct %>% bind_rows %>% data.frame %>% mutate(transform = "Percent")

combined_data <- bind_rows(raw_long, pct_long)

combined_data %>%
    filter(group %in% c('Flu','Normal')) %>%
    ggplot(aes(x = species, y=counts, fill = group)) +
  geom_boxplot() +
  facet_wrap(~ transform, scales = "free_y") +
  labs(title = "Comparison of Species Counts per Group",
       x = "Species",
       y = "Counts / Percentages") +
  theme_minimal(base_size = 20) +
  theme(axis.text.x = element_text(angle=45, hjust=1),
        legend.title = element_blank()) +
  scale_fill_manual(values = unname(pal[c(1,4)]))

wilcoxon_comparisons <- bind_rows(wilcoxon_raw,wilcoxon_pct)

ggplot(wilcoxon_comparisons, aes(x = reorder(species, p.val), y = p.val, color = p.val < 0.05)) +
  geom_point() + # Use geom_bar() for bar plots
  scale_color_manual(values = c("TRUE" = "red", "FALSE" = "blue"), name = "p-val < 0.05") +
  facet_wrap(~ transform, scales = "free_x") +
  labs(x = "Species", title = "Comparison of P-Values Across Species and Transformations") +
  theme_minimal(base_size = 20) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "top") +
  guides(fill = guide_legend(title = "P-value < 0.05"))

Compositional transforms

The compositions package provides compositional transforms that allow quantitative comparisons of compositional data. In the most popular package for compositional analysis in scRNA, scCODA, the additive-log-ratio is used, where compositions are transformed to log-ratios of the cluster with the least dispersion that is present across 95% of all samples.

\text{ALR}(x_i) = \log\left(\frac{x_i}{x_D}\right)

where:

  • \log denotes the natural logarithm,
  • x_i is the value of component i in the composition,
  • x_D is the chosen reference component from the compositional dataset.

\text{CLR}(x_i) = \log\left(\frac{x_i}{g(x)}\right)

where:

  • \log denotes the natural logarithm,
  • x_i is the value of component i in the composition,
  • g(x) is the geometric mean of all components in the composition, calculated as g(x) = \left(\prod_{i=1}^{D} x_i\right)^{\frac{1}{D}}, with D being the total number of components in the composition. \prod_{} means multiply each x_i, so you can think of it as an extension of the pythagorean theorem to any n components

The CLR is very easy to use in practice because it does not require a reference. In datasets with very noisy cell subtypes or very different samples, it can be better than the ALR, which can fail when there isn’t a good reference cluster. For this notebook, we can calculate the ALR in a similar way to what scCODA (the first CODA package built into scanpy) does by choosing the reference based on which cluster has minimal dispersion and at least 95% presence across samples

compTb <- pctTb

clrTb <- compositions::clr(compTb[,-1]) %>% data.frame
colnames(clrTb) <- species_col_names
clrTb$group <- compTb$group 

# Create a long dataframe of raw values for comparisons
clr_long <- clrTb %>% 
  pivot_longer(-group, names_to="species", values_to="counts") %>%
  mutate(transform = "CLR")

wilcoxon_clr <- lapply(species_col_names, function(species) 
                        perform_wilcox_test(clrTb, 
                                            group_col = "group", 
                                            group1 = 'Flu', 
                                            group2 = 'Normal', 
                                            species_col = species))

wilcoxon_clr <- wilcoxon_clr %>% bind_rows %>% data.frame %>% mutate(transform = "CLR")

presence_threshold <- 0.95
presence <- apply(compTb > 0, 2, function(x) mean(x)) > presence_threshold
# make sure group column isn't included
presence[1] <- FALSE

iqr_values <- apply(compTb[, presence], 2, IQR)
denominator_index <- which.min(iqr_values)
denominator_name <- names(iqr_values[denominator_index])

alrTb <- compositions::alr(rawTb[,-1] + 1, # apply a tiny pseudocount to avoid logarithmizing a 0
                           ivar = denominator_name) %>% #ivar is the invariant variable
         data.frame

alr_col_names <- species_col_names[species_col_names!=denominator_name]
colnames(alrTb) <- alr_col_names
alrTb$group <- compTb$group 

alr_long <- alrTb %>% 
  pivot_longer(-group, names_to="species", values_to="counts") %>%
  mutate(transform = "ALR")

combined_data <- bind_rows(combined_data, clr_long, alr_long)

ggplot(combined_data %>% filter(group %in% c('Flu','Normal')), 
  aes(x = species, y=counts, fill = group)) +
  geom_boxplot() +
  labs(title = "Comparison of Species Counts per Group",
       x = "Species",
       y = "values") +
  theme_minimal(base_size = 20) +
  theme(axis.text.x = element_text(angle=45, hjust=1),
        legend.title = element_blank()) +
  scale_fill_manual(values = unname(pal[c(1,4)])) +
  facet_wrap(~ transform, scales = "free_y")

wilcoxon_alr <- lapply(alr_col_names, function(species) 
                        perform_wilcox_test(pctTb, 
                        group_col = "group", 
                        group1 = 'Flu', 
                        group2 = 'Normal', 
                        species_col = species))

wilcoxon_alr <- wilcoxon_alr %>% bind_rows %>% data.frame %>% mutate(transform = "ALR")

wilcoxon_comparisons <- bind_rows(wilcoxon_comparisons,wilcoxon_clr,wilcoxon_alr)
ggplot(wilcoxon_comparisons, aes(x = reorder(species, p.val), y = p.val, color = p.val < 0.05)) +
  geom_point() + 
  scale_color_manual(values = c("TRUE" = "red", "FALSE" = "blue"), name = "p-val < 0.05") +
  facet_wrap(~ transform, scales = "free_x") +
  labs(x = "Species", title = "Comparison of P-Values Across Species and Transformations") +
  theme_minimal(base_size = 20) +
  theme(
      axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
      legend.position = "top") +
  guides(fill = guide_legend(title = "P-value < 0.05"))

We can see that calculating rank sum differences across samples can produce a lot of false positives if we don’t normalize our data properly. But even with mere normalization to overall cell number, the Wilcoxon shows similar results to ALR. This is because the ranks of cell population proportions do not change much with normalization. If we used a Bayesian or parametric test, we would find more false positives with the percentages than with the compositional transforms.

The “best practice” method that has been shown to be successful for compositional analysis in microbiome (ANOVA-Like Differential Expression (ALDEx) Analysis for Mixed Population RNA-Seq PLOS ONE n.d.) and single-cell data (Mangiola et al. 2023) is composition-transform paired with binomial distribution modeling ANOVA tests. sccomp is available on github

Estimating a size effect in compositional data

We can see that calculating rank sum differences across samples can produce a lot of false positives if we don’t normalize our data properly. But even with mere normalization to overall cell number, the Wilcoxon shows similar results to ALR. This is because the ranks of cell population proportions do not change much with normalization. If we used a Bayesian or parametric test, we would find more false positives with the percentages than with the compositional transforms.

The “best practice” method that has been shown to be successful for compositional analysis in microbiome (ANOVA-Like Differential Expression (ALDEx) Analysis for Mixed Population RNA-Seq PLOS ONE n.d.) and single-cell data (Mangiola et al. 2023) is composition-transform paired with binomial distribution modeling ANOVA tests. sccomp is available on github

srobj$sample <- srobj$`Sample ID` %>% str_replace_all(' ','')

srobj$disease <- srobj$`Sample ID` %>% str_replace_all("\\ .*","")

sccomp_est <- subset(srobj, disease %in% c('Flu','Normal')) %>%
                  sccomp_estimate( 
                        formula_composition = ~ 0 + disease, 
                        formula_variability = ~ 0 + disease,
                        .sample = sample,
                        .cell_group = Celltype, 
                        bimodal_mean_variability_association = TRUE,
                        cores = 4
                  )

# Run post-model processing
sccomp_res <- sccomp_est %>%
                  sccomp_remove_outliers() %>%
                  sccomp_test(contrasts = "diseaseFlu - diseaseNormal",
                  test_composition_above_logit_fold_change = 0.2)

The plotting methods don’t work well in the sccomp package. I recommend using the data to make plots directly. A familiar plot for differential tests is the volcano plot.

EnhancedVolcano(sccomp_res, 
                x = "c_effect", 
                y = "c_FDR", 
                lab = sccomp_res$Celltype, 
                pCutoff = 0.05, 
                title = "sccomp differential composition Flu vs. Normal", 
                subtitle = "p-val is FDR. FDR cutoff = 0.05; Flu positive, Normal negative")

In sccomp, a differential composition “c” and differential variability “v” are both calculated between groups. We can plot each of these separately as well

sccomp_res <- sccomp_res %>%
  mutate(signif = ifelse(c_FDR < 0.05, "FDR < 0.05", "FDR >= 0.05")) %>% arrange(c_effect)

ggplot(sccomp_res, aes(y = factor(Celltype,levels=Celltype), x = c_effect)) +
  geom_segment(aes(y = Celltype, 
                   yend = Celltype, 
                   x = c_lower, 
                   xend = c_upper, 
                   color = signif), 
                   size = 1) +
  geom_point(size = 3) + # Add point at c_effect
  scale_color_manual(values = c("FDR < 0.05" = "red3", "FDR >= 0.05" = "black")) +
  theme_minimal(base_size = 16) + 
  labs(title = "Composition test intervals with FDR",
       x = "Log-fold-change",
       y = "Celltype") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.title = element_blank()) # Improve x-axis labels readability

sccomp_res <- sccomp_res %>%
  mutate(signif = ifelse(v_FDR < 0.05, "FDR < 0.05", "FDR >= 0.05")) %>% arrange(v_effect)

ggplot(sccomp_res, aes(y = factor(Celltype,levels=Celltype), x = v_effect)) +
  geom_segment(aes(y = Celltype, 
                   yend = Celltype, 
                   x = v_lower, 
                   xend = v_upper, 
                   color = signif), 
                   size = 1) +
  geom_point(size = 3) + # Add point at c_effect
  scale_color_manual(values = c("FDR < 0.05" = "red3", "FDR >= 0.05" = "black")) +
  theme_minimal(base_size = 16) + 
  labs(title = "Variability test intervals with FDR",
       x = "Difference in Variance",
       y = "Celltype") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.title = element_blank()) # Improve x-axis labels readability

When there are many cell states in a dataset, for example in a large dataset, it can be nice to make a volcano plot of results for interpretability, so here we include code for creating a nice looking one based on sccomp results

Comparing samples with compositions

PCA

comp_ls <- list("alr" = alrTb, "clr" = clrTb, "raw" = rawTb, "pct" = pctTb)

# Run PCA on each transformation
pca_ls <- lapply(names(comp_ls), function(i) {
    PCA <- comp_ls[[i]] %>%
        dplyr::select(-group) %>%
        prcomp()
    })

names(pca_ls) <- names(comp_ls)

# Extract data of interest from each transformation
pcaDF <- lapply(names(pca_ls), function(i) {
    # Return dataframe
                data.frame(
                    pca_ls[[i]]$x,
                    group = comp_ls[[i]]$group,
                    transform = i)
                }) %>% 
    bind_rows() %>%
    rownames_to_column("sample_id") %>%
    mutate(
        sample_id = str_replace_all(sample_id, '\\.\\.\\.(.*)', ''))


ggplot(pcaDF, aes(x = PC1, y = PC2, color = group, label = sample_id)) +
  geom_point(size=4) + # Use geom_bar() for bar plots 
  geom_text_repel() +
  facet_wrap(~ transform, scales = "free") +
  labs(title = "PCA Across Transformations") +
  theme_minimal(base_size = 20) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "bottom") +
  guides(fill = guide_legend(title = "P-value < 0.05")) +
  scale_color_manual(values = unname(pal[c(4,6,9)]))

loadingsDF <- lapply(names(pca_ls), function(i) {
                    data.frame(
                        pca_ls[[i]]$rotation,
                        transform = i)
                    }) %>%
    bind_rows() %>%
    rownames_to_column('species') %>%
    group_by(transform) %>%
    arrange(PC1)


loadingsDF %>%
    ggplot(
        aes(x = PC1, y = factor(species, levels = loadingsDF$species),
        fill = PC1 < 0)) +
    geom_bar(stat = 'identity') + 
    facet_wrap(.~transform, scales = 'free') +
    theme_linedraw(base_size = 20) +
    scale_fill_manual(values = unname(pal[c(6, 11)])) #+

    # scale_y_discrete(labels = function(x) str_replace_all(x, '\\.\\.\\.(.*)',''))

Distances between samples

We can also directly calculate distances between samples based on sample composition space. This can give us a better idea about outliers and how samples group together. We can also use any of the clustering metrics to test coherence of groups.

In compositional analysis, the method for calculating distances between samples proposed by John Aitchison (John Aitchison 2005) remains the a popular method. Termed Aitchison Distance, it is the Euclidean distance between samples based on CLR transformed counts of species.

clr_distances <- dist(clrTb[,-16], method = "euclidean")

distdf <- reshape2::melt(as.matrix(clr_distances), varnames = c("from", "to"))

datatable(distdf)
distlong <- distdf %>% separate(from,c('group1','sample1'),sep=' ', remove=FALSE) %>% 
                       separate(to,c('group2','sample2'),sep=' ', remove=FALSE)
            
distlong %>% filter(sample1 !=sample2) %>% 
             mutate(distance_group = ifelse(group1==group2, 
                                            'between same disease individuals',
                                     ifelse(group1 == 'Normal' | group2 == 'Normal', 
                                            'between disease and healthy', 
                                            'between Covid and Flu'))) %>% 
    ggplot(aes(x=value,fill=distance_group)) + geom_density(alpha=0.4) + 
    ggtitle('COVID-Flu study distances\nAitchison Distances') +
    theme_bw(base_size = 20) + scale_fill_manual(values = unname(pal[c(3,6,8)])) + xlab('euclidean distance')

ComplexHeatmap::Heatmap(as.matrix(clr_distances))

Session Info

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] sccomp_1.7.6           ComplexHeatmap_2.18.0  reshape2_1.4.4         compositions_2.0-8     Seurat_5.0.2           SeuratObject_5.0.1     sp_2.1-3               DT_0.32                scales_1.3.0           viridis_0.6.5          viridisLite_0.4.2      EnhancedVolcano_1.20.0 ggrepel_0.9.5          lubridate_1.9.3        forcats_1.0.0          stringr_1.5.1          dplyr_1.1.4            purrr_1.0.2            readr_2.1.5            tidyr_1.3.1            tibble_3.2.1           ggplot2_3.5.0          tidyverse_2.0.0        colorBlindness_0.1.9   BiocManager_1.30.22    knitr_1.45            

loaded via a namespace (and not attached):
  [1] RcppAnnoy_0.0.22            splines_4.3.1               later_1.3.2                 bitops_1.0-7                polyclip_1.10-6             fastDummies_1.7.3           lifecycle_1.0.4             StanHeaders_2.32.6          doParallel_1.0.17           vroom_1.6.5                 globals_0.16.3              lattice_0.22-5              MASS_7.3-60.0.1             crosstalk_1.2.1             magrittr_2.0.3              sass_0.4.8                  plotly_4.10.4               rmarkdown_2.26              jquerylib_0.1.4             yaml_2.3.8                  httpuv_1.6.14               sctransform_0.4.1           spam_2.10-0                 pkgbuild_1.4.3              spatstat.sparse_3.0-3       reticulate_1.35.0           cowplot_1.1.3               pbapply_1.7-2               bayesm_3.1-6                RColorBrewer_1.1-3          abind_1.4-5                 zlibbioc_1.48.0             Rtsne_0.17                  GenomicRanges_1.54.1        RCurl_1.98-1.14             BiocGenerics_0.48.1         tensorA_0.36.2.1            circlize_0.4.16             GenomeInfoDbData_1.2.11     IRanges_2.36.0              S4Vectors_0.40.2            inline_0.3.19              
 [43] irlba_2.3.5.1               listenv_0.9.1               spatstat.utils_3.0-4        goftest_1.2-3               RSpectra_0.16-1             spatstat.random_3.2-3       fitdistrplus_1.1-11         parallelly_1.37.1           DelayedArray_0.28.0         leiden_0.4.3.1              codetools_0.2-19            tidyselect_1.2.0            shape_1.4.6.1               farver_2.1.1                matrixStats_1.2.0           stats4_4.3.1                spatstat.explore_3.2-6      jsonlite_1.8.8              GetoptLong_1.0.5            ellipsis_0.3.2              progressr_0.14.0            ggridges_0.5.6              survival_3.5-8              iterators_1.0.14            foreach_1.5.2               tools_4.3.1                 ica_1.0-3                   Rcpp_1.0.12                 glue_1.7.0                  SparseArray_1.2.4           gridExtra_2.3               xfun_0.42                   MatrixGenerics_1.14.0       GenomeInfoDb_1.38.7         withr_3.0.0                 loo_2.7.0                   fastmap_1.1.1               boot_1.3-30                 fansi_1.0.6                 digest_0.6.34               timechange_0.3.0            R6_2.5.1                   
 [85] mime_0.12                   gridGraphics_0.5-1          colorspace_2.1-0            scattermore_1.2             tensor_1.5                  spatstat.data_3.0-4         utf8_1.2.4                  generics_0.1.3              data.table_1.15.2           robustbase_0.99-2           S4Arrays_1.2.1              httr_1.4.7                  htmlwidgets_1.6.4           uwot_0.1.16                 pkgconfig_2.0.3             gtable_0.3.4                lmtest_0.9-40               SingleCellExperiment_1.24.0 XVector_0.42.0              htmltools_0.5.7             dotCall64_1.1-1             clue_0.3-65                 Biobase_2.62.0              png_0.1-8                   tzdb_0.4.0                  rjson_0.2.21                curl_5.2.1                  nlme_3.1-164                cachem_1.0.8                zoo_1.8-12                  GlobalOptions_0.1.2         KernSmooth_2.23-22          parallel_4.3.1              miniUI_0.1.1.1              pillar_1.9.0                vctrs_0.6.5                 RANN_2.6.1                  promises_1.2.1              xtable_1.8-4                cluster_2.1.6               evaluate_0.23               magick_2.8.3               
[127] cli_3.6.2                   compiler_4.3.1              rlang_1.1.3                 crayon_1.5.2                rstantools_2.4.0            future.apply_1.11.1         labeling_0.4.3              plyr_1.8.9                  rstan_2.32.6                stringi_1.8.3               QuickJSR_1.1.3              deldir_2.0-4                munsell_0.5.0               lazyeval_0.2.2              spatstat.geom_3.2-9         V8_4.4.2                    Matrix_1.6-5                RcppHNSW_0.6.0              hms_1.1.3                   patchwork_1.2.0             bit64_4.0.5                 future_1.33.1               shiny_1.8.0                 SummarizedExperiment_1.32.0 ROCR_1.0-11                 igraph_2.0.2                bslib_0.6.1                 RcppParallel_5.1.7          bit_4.0.5                   DEoptimR_1.1-3             

References

Aitchison, J. 1982. “The Statistical Analysis of Compositional Data.” Journal of the Royal Statistical Society. Series B (Methodological) 44 (2): 139–77. https://www.jstor.org/stable/2345821.
Aitchison, John. 2005. “A Concise Guide to Compositional Data Analysis.” Research {Reports} or {Papers}. http://web.archive.org/web/20210423114827/http://ima.udg.edu/Activitats/CoDaWork05/.
ANOVA-Like Differential Expression (ALDEx) Analysis for Mixed Population RNA-Seq PLOS ONE.” n.d. Accessed March 12, 2024. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0067019.
Büttner, M., J. Ostner, C. L. Müller, F. J. Theis, and B. Schubert. 2021. scCODA Is a Bayesian Model for Compositional Single-Cell Data Analysis.” Nature Communications 12 (1): 6876. https://doi.org/10.1038/s41467-021-27150-6.
Mangiola, Stefano, Alexandra J. Roth-Schulze, Marie Trussart, Enrique Zozaya-Valdés, Mengyao Ma, Zijie Gao, Alan F. Rubin, Terence P. Speed, Heejung Shim, and Anthony T. Papenfuss. 2023. “Sccomp: Robust Differential Composition and Variability Analysis for Single-Cell Data.” Proceedings of the National Academy of Sciences 120 (33): e2203828120. https://doi.org/10.1073/pnas.2203828120.
Morton, James T., Clarisse Marotz, Alex Washburne, Justin Silverman, Livia S. Zaramela, Anna Edlund, Karsten Zengler, and Rob Knight. 2019. “Establishing Microbial Composition Measurement Standards with Reference Frames.” Nature Communications 10 (1): 2719. https://doi.org/10.1038/s41467-019-10656-5.